Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Chinese short text classification method by combining semantic expansion and convolutional neural network
LU Ling, YANG Wu, YANG Youjun, CHEN Menghan
Journal of Computer Applications    2017, 37 (12): 3498-3503.   DOI: 10.11772/j.issn.1001-9081.2017.12.3498
Abstract520)      PDF (928KB)(870)       Save
Chinese news title usually consists of a single word to dozens of words. It is difficult to improve the accuracy of news title classification due to the problems such as few characters and sparse features. In order to solve the problems, a new method for text semantic expansion based on word embedding was proposed. Firstly, the news title was expanded into triples consisting of title, subtitle and keywords. The subtitle was constructed by combining the synonym of title and the part of speech filtering method, and the keywords were extracted from the semantic composition of words in multi-scale sliding windows. Then, the Convolutional Neural Network (CNN) model was constructed for categorizing the expanded text. Max pooling and random dropout were used for feature filtering and avoidance of overfitting. Finally, the double-word spliced by title and subtitle, and the multi-keyword set were fed into the model respectively. Experiments were conducted on the news title classification dataset of the Natural Language Processing & Chinese Computing in 2017 (NLP&CC2017). The experimental results show that, the classification precision of the combination model of expanding news title to triples and CNN is 79.42% in 18 categories of news titles, which is 9.5% higher than the original CNN model without expanding, and the convergence rate of model is improved by keywords expansion. The proposed expansion method of triples and the constructed CNN model are verified to be effective.
Reference | Related Articles | Metrics